A Relevant Content Filtering Based Framework for Data Stream Summarization

Authors

  • Cailing Dong
  • Arvind Agarwal
Abstract

Social media platforms are a rich source of information these days; however, of all the available information, only a small fraction is of interest to any given user. To help users catch up with the latest topics of interest amid the large amount of information available on social media, we present a framework for data stream summarization based on relevant content filtering. More specifically, given a topic or event of interest, the framework dynamically discovers relevant information and separates it from irrelevant information in the stream of text provided by social media platforms. It then captures the most representative and up-to-date information to generate a sequential summary, or event storyline, that follows the evolution of the topic or event. Our framework does not depend on any labeled data; instead, it uses weak supervision provided by the user, which matches the real-world scenario of users searching for information about an ongoing event. We experimented on two real events traced in a Twitter dataset from TREC 2011. The results verify the effectiveness of the proposed framework's relevant content filtering and sequential summary generation. They also show its robustness when using the most easily obtained weak supervision, i.e., a trending topic or hashtag. Thus, the framework can be easily integrated into social media platforms such as Twitter to generate sequential summaries for events of interest. We also make the manually generated gold-standard sequential summaries of the two test events publicly available for future use by the community.
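The two stages described in the abstract (filtering by relevance to a user-supplied topic, then picking representative, non-redundant posts over time) can be illustrated with a minimal sketch. This is not the authors' method: the term-overlap relevance score, the fixed time windows, the Jaccard redundancy check, and all names and thresholds (`build_topic_terms`, `relevance`, `sequential_summary`, `min_rel`, `max_sim`, the tiny stopword list) are assumptions chosen for illustration. It also processes the posts as a batch rather than as a live stream.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"#?\w+")
# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "after", "and", "as", "i", "in", "is", "of", "so", "the", "to"}


def tokenize(text):
    """Lowercase a post and split it into word and hashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def content_terms(text):
    """Distinct tokens of a post with stopwords removed."""
    return set(tokenize(text)) - STOPWORDS


def build_topic_terms(posts, seed, top_k=10):
    """Weak supervision: terms that co-occur most often with the seed hashtag."""
    seed = seed.lower()
    counts = Counter()
    for _, text in posts:
        terms = content_terms(text)
        if seed in terms:
            counts.update(sorted(terms))  # sorted for deterministic tie order
    return {term for term, _ in counts.most_common(top_k)}


def relevance(text, topic_terms):
    """Fraction of a post's content terms that belong to the topic vocabulary."""
    terms = content_terms(text)
    return len(terms & topic_terms) / len(terms) if terms else 0.0


def jaccard(a, b):
    """Token-set overlap between two posts, used as a redundancy check."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def sequential_summary(posts, seed, window=3600, min_rel=0.3, max_sim=0.6):
    """Keep the most relevant post per time window, skipping near-duplicates.

    posts: iterable of (timestamp_in_seconds, text) pairs.
    """
    posts = list(posts)
    topic_terms = build_topic_terms(posts, seed)
    best = {}  # window index -> (score, timestamp, text)
    for ts, text in sorted(posts):
        score = relevance(text, topic_terms)
        if score < min_rel:
            continue  # filtered out as irrelevant
        key = ts // window
        if key not in best or score > best[key][0]:
            best[key] = (score, ts, text)
    summary = []
    for key in sorted(best):
        _, ts, text = best[key]
        if all(jaccard(text, prev) <= max_sim for _, prev in summary):
            summary.append((ts, text))
    return summary
```

Calling `sequential_summary(posts, "#egypt")` on a list of `(timestamp, text)` pairs returns a time-ordered list of selected posts. A real system would replace the term-overlap score with a learned or adaptive relevance model and update the topic vocabulary as the event evolves.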

Related resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Systematic literature review of fuzzy logic based text summarization

"Information Overload" is not a new term but with the massive development in technology which enables anytime, anywhere, easy and unlimited access; participation & publishing of information has consequently escalated its impact. Assisting users' informational searches with reduced reading surfing time by extracting and evaluating accurate, authentic & relevant information are the primary c...

NOVASearch at TREC 2017 Real-Time Summarization Track

The rise of large data streams introduces new challenges regarding the delivery of relevant content towards an information need. This information need can be seen as a broad topic of information. One possible strategy to tackle the delivery of the most relevant documents regarding this broader topic is summarization. TREC 2017 Real-Time Summarization (RTS) provides a testbed for the development...

IRIT at TREC Real Time Summarization 2016

This paper presents the participation of the IRIT laboratory (University of Toulouse) in the Real Time Summarization track of TREC 2016. This track consists of filtering the tweet stream in real time and identifying both relevant and novel tweets to be pushed to the user in real time. Our team proposes three different approaches: (1) The first approach consists of a filtering model that combines seve...

WaterlooClarke: TREC 2015 Temporal Summarization Track

The Temporal Summarization Track looks at providing meaningful summaries of major events and sub-events as they occur. Difficulties arise due to the unique nature of the temporal summarization task, in which the corpus is constantly changing along with the known information about the event [1]. This year, the temporal summarization track consists of three tasks, two filtering and summarization ...

Publication date: 2016